OCR exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

TREC-5 experiments at Dublin City University : Query space reduction, Spanish & character shape encoding

Internal identifier: 002571 (Main/Exploration); previous: 002570; next: 002572

Authors: F. Kelledy [Ireland]; A. F. Smeaton [Ireland]

Source: NIST special publication (ISSN 1048-776X), 1997

RBID : Pascal:98-0258326

Abstract

In this paper we describe work done as part of the TREC-5 benchmarking exercise by a team from Dublin City University. In TREC-5 we had three activities as follows:

1) Our ad hoc submissions employ Query Space Reduction techniques which attempt to minimise the amount of data processed by an IR search engine during the retrieval process. We submitted four runs for evaluation, two automatic and two manual, with one automatic run and one manual run employing our Query Space Reduction techniques. The paper reports our findings in terms of retrieval effectiveness and also in terms of the savings we make in execution time.

2) Our submission to the multi-lingual track (Spanish) in TREC-5 involves evaluating the performance of a new stemming algorithm for Spanish developed by Martin Porter. We submitted three runs for evaluation, two automatic and one manual, involving a manual expansion from retrieved documents.

3) Character shape coding (CSC) is a technique for representing scanned text using a much reduced alphabet. It has been developed by Larry Spitz of Daimler Benz as an alternative to full-scale OCR for paper documents. Some of our TREC-5 experiments have started evaluating the performance of a CSC representation of scanned documents for information retrieval, and this paper outlines our future work in this area.



The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en" level="a">TREC-5 experiments at Dublin City University : Query space reduction, Spanish &amp; character shape encoding</title>
<author>
<name sortKey="Kelledy, F" sort="Kelledy, F" uniqKey="Kelledy F" first="F." last="Kelledy">F. Kelledy</name>
<affiliation wicri:level="1">
<inist:fA14 i1="01">
<s1>School of Computer Applications, Dublin City University</s1>
<s2>Glasnevin, Dublin</s2>
<s3>IRL</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>Irlande (pays)</country>
<wicri:noRegion>Glasnevin, Dublin</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Smeaton, A F" sort="Smeaton, A F" uniqKey="Smeaton A" first="A. F." last="Smeaton">A. F. Smeaton</name>
<affiliation wicri:level="1">
<inist:fA14 i1="01">
<s1>School of Computer Applications, Dublin City University</s1>
<s2>Glasnevin, Dublin</s2>
<s3>IRL</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>Irlande (pays)</country>
<wicri:noRegion>Glasnevin, Dublin</wicri:noRegion>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">INIST</idno>
<idno type="inist">98-0258326</idno>
<date when="1997">1997</date>
<idno type="stanalyst">PASCAL 98-0258326 INIST</idno>
<idno type="RBID">Pascal:98-0258326</idno>
<idno type="wicri:Area/PascalFrancis/Corpus">000892</idno>
<idno type="wicri:Area/PascalFrancis/Curation">000B05</idno>
<idno type="wicri:Area/PascalFrancis/Checkpoint">000874</idno>
<idno type="wicri:doubleKey">1048-776X:1997:Kelledy F:trec:experiments:at</idno>
<idno type="wicri:Area/Main/Merge">002705</idno>
<idno type="wicri:Area/Main/Curation">002571</idno>
<idno type="wicri:Area/Main/Exploration">002571</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a">TREC-5 experiments at Dublin City University : Query space reduction, Spanish &amp; character shape encoding</title>
<author>
<name sortKey="Kelledy, F" sort="Kelledy, F" uniqKey="Kelledy F" first="F." last="Kelledy">F. Kelledy</name>
<affiliation wicri:level="1">
<inist:fA14 i1="01">
<s1>School of Computer Applications, Dublin City University</s1>
<s2>Glasnevin, Dublin</s2>
<s3>IRL</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>Irlande (pays)</country>
<wicri:noRegion>Glasnevin, Dublin</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Smeaton, A F" sort="Smeaton, A F" uniqKey="Smeaton A" first="A. F." last="Smeaton">A. F. Smeaton</name>
<affiliation wicri:level="1">
<inist:fA14 i1="01">
<s1>School of Computer Applications, Dublin City University</s1>
<s2>Glasnevin, Dublin</s2>
<s3>IRL</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>Irlande (pays)</country>
<wicri:noRegion>Glasnevin, Dublin</wicri:noRegion>
</affiliation>
</author>
</analytic>
<series>
<title level="j" type="main">NIST special publication</title>
<title level="j" type="abbreviated">NIST spec. publ.</title>
<idno type="ISSN">1048-776X</idno>
<imprint>
<date when="1997">1997</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt>
<title level="j" type="main">NIST special publication</title>
<title level="j" type="abbreviated">NIST spec. publ.</title>
<idno type="ISSN">1048-776X</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Automated processing</term>
<term>Character recognition</term>
<term>Duration</term>
<term>Information retrieval</term>
<term>Ireland</term>
<term>Manual processing</term>
<term>Query</term>
<term>Query formulation</term>
<term>Question processing</term>
<term>Relevance</term>
<term>Spanish</term>
<term>Threshold</term>
</keywords>
<keywords scheme="Pascal" xml:lang="fr">
<term>Recherche information</term>
<term>Question documentaire</term>
<term>Formulation question</term>
<term>Seuil</term>
<term>Durée</term>
<term>Traitement manuel</term>
<term>Traitement automatisé</term>
<term>Espagnol</term>
<term>Reconnaissance caractère</term>
<term>Pertinence</term>
<term>Irlande</term>
<term>Dublin</term>
<term>QTT (Query Term Thresholding)</term>
<term>CSC (Character Shape Coding)</term>
<term>QSP (Query Space Reduction)</term>
<term>Traitement question</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">In this paper we describe work done as part of the TREC-5 benchmarking exercise by a team from Dublin City University. In TREC-5 we had three activities as follows: 1) Our ad hoc submissions employ Query Space Reduction techniques which attempt to minimise the amount of data processed by an IR search engine during the retrieval process. We submitted four runs for evaluation, two automatic and two manual, with one automatic run and one manual run employing our Query Space Reduction techniques. The paper reports our findings in terms of retrieval effectiveness and also in terms of the savings we make in execution time. 2) Our submission to the multi-lingual track (Spanish) in TREC-5 involves evaluating the performance of a new stemming algorithm for Spanish developed by Martin Porter. We submitted three runs for evaluation, two automatic and one manual, involving a manual expansion from retrieved documents. 3) Character shape coding (CSC) is a technique for representing scanned text using a much reduced alphabet. It has been developed by Larry Spitz of Daimler Benz as an alternative to full-scale OCR for paper documents. Some of our TREC-5 experiments have started evaluating the performance of a CSC representation of scanned documents for information retrieval, and this paper outlines our future work in this area.</div>
</front>
</TEI>
<affiliations>
<list>
<country>
<li>Irlande (pays)</li>
</country>
</list>
<tree>
<country name="Irlande (pays)">
<noRegion>
<name sortKey="Kelledy, F" sort="Kelledy, F" uniqKey="Kelledy F" first="F." last="Kelledy">F. Kelledy</name>
</noRegion>
<name sortKey="Smeaton, A F" sort="Smeaton, A F" uniqKey="Smeaton A" first="A. F." last="Smeaton">A. F. Smeaton</name>
</country>
</tree>
</affiliations>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 002571 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 002571 | SxmlIndent | more

To add a link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     Pascal:98-0258326
   |texte=   TREC-5 experiments at Dublin City University : Query space reduction, Spanish & character shape encoding
}}

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024